Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

Numeric: 8
Categorical: 1

Alerts

pregnant is highly overall correlated with age [High correlation]
skin_thickness is highly overall correlated with insulin [High correlation]
insulin is highly overall correlated with skin_thickness [High correlation]
age is highly overall correlated with pregnant [High correlation]
bp is highly overall correlated with bmi [High correlation]
bmi is highly overall correlated with bp [High correlation]
pregnant has 111 (14.5%) zeros [Zeros]
bp has 35 (4.6%) zeros [Zeros]
skin_thickness has 227 (29.6%) zeros [Zeros]
insulin has 374 (48.7%) zeros [Zeros]
bmi has 11 (1.4%) zeros [Zeros]
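
The zero counts flagged in these alerts can matter for later analysis. The following is a minimal sketch, not part of the report, that assumes a zero in bp, skin_thickness, insulin, or bmi stands for an unrecorded measurement; the report itself does not say so. The rows are rows 0, 2, 7, 8 and 9 of the Sample section, and the helper names zeros_to_none and missing_counts are hypothetical.

```python
# Columns where a zero is ASSUMED to mean "not measured". This is an
# assumption made for illustration; the report does not state it.
# `pregnant` is left out because zero pregnancies is a plausible real value.
ZERO_AS_MISSING = ("bp", "skin_thickness", "insulin", "bmi")

COLUMNS = ("pregnant", "plasma", "bp", "skin_thickness", "insulin",
           "bmi", "diabetes_pedigree_function", "age", "outcome")

# Rows 0, 2, 7, 8 and 9 of the Sample section of this report.
ROWS = [
    (6, 148, 72, 35, 0, 33.6, 0.627, 50, 1),
    (8, 183, 64, 0, 0, 23.3, 0.672, 32, 1),
    (10, 115, 0, 0, 0, 35.3, 0.134, 29, 0),
    (2, 197, 70, 45, 543, 30.5, 0.158, 53, 1),
    (8, 125, 96, 0, 0, 0.0, 0.232, 54, 1),
]


def zeros_to_none(row):
    """Return the row as a dict, with assumed-missing zeros replaced by None."""
    record = dict(zip(COLUMNS, row))
    for name in ZERO_AS_MISSING:
        if record[name] == 0:
            record[name] = None
    return record


def missing_counts(records):
    """Count None entries per column among the assumed-missing columns."""
    return {name: sum(r[name] is None for r in records)
            for name in ZERO_AS_MISSING}


records = [zeros_to_none(row) for row in ROWS]
counts = missing_counts(records)
print(counts)
```

Whether to treat these zeros as missing is a modelling decision; the sketch only shows how the recount would work once that decision is made.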

Reproduction

Analysis started: 2022-12-04 04:06:13.174684
Analysis finished: 2022-12-04 04:06:30.439442
Duration: 17.26 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

pregnant
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8450521
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 6
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.3695781
Coefficient of variation (CV): 0.87634133
Kurtosis: 0.15921978
Mean: 3.8450521
Median Absolute Deviation (MAD): 2
Skewness: 0.90167398
Sum: 2953
Variance: 11.354056
Monotonicity: Not monotonic
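
Several of these statistics follow from one another. The short check below is a sketch, not part of the report: it recomputes the derived values for pregnant from the reported sum, standard deviation, quartiles, and extremes. The variable names are illustrative.

```python
# Values copied from the statistics reported for `pregnant`.
n = 768                  # Number of observations
total = 2953             # Sum
std = 3.3695781          # Standard deviation
q1, q3 = 1, 6            # Q1 and Q3
minimum, maximum = 0, 17

mean = total / n               # Mean = Sum / number of observations
cv = std / mean                # Coefficient of variation = std / mean
variance = std ** 2            # Variance = std squared
iqr = q3 - q1                  # Interquartile range = Q3 - Q1
value_range = maximum - minimum

print(round(mean, 7), round(cv, 8), round(variance, 6), iqr, value_range)
```

The same relations hold for every numeric variable in the report.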
Histogram with fixed size bins (bins=17)

Value             Count  Frequency (%)
1                 135    17.6%
0                 111    14.5%
2                 103    13.4%
3                 75     9.8%
4                 68     8.9%
5                 57     7.4%
6                 50     6.5%
7                 45     5.9%
8                 38     4.9%
9                 28     3.6%
Other values (7)  58     7.6%

Value  Count  Frequency (%)
0      111    14.5%
1      135    17.6%
2      103    13.4%
3      75     9.8%
4      68     8.9%
5      57     7.4%
6      50     6.5%
7      45     5.9%
8      38     4.9%
9      28     3.6%

Value  Count  Frequency (%)
17     1      0.1%
15     1      0.1%
14     2      0.3%
13     10     1.3%
12     9      1.2%
11     11     1.4%
10     24     3.1%
9      28     3.6%
8      38     4.9%
7      45     5.9%

plasma
Real number (ℝ)

Distinct: 136
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.89453
Minimum: 0
Maximum: 199
Zeros: 5
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 79
Q1: 99
median: 117
Q3: 140.25
95-th percentile: 181
Maximum: 199
Range: 199
Interquartile range (IQR): 41.25

Descriptive statistics

Standard deviation: 31.972618
Coefficient of variation (CV): 0.26446703
Kurtosis: 0.64077982
Mean: 120.89453
Median Absolute Deviation (MAD): 20
Skewness: 0.1737535
Sum: 92847
Variance: 1022.2483
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
99                  17     2.2%
100                 17     2.2%
111                 14     1.8%
129                 14     1.8%
125                 14     1.8%
106                 14     1.8%
112                 13     1.7%
108                 13     1.7%
95                  13     1.7%
105                 13     1.7%
Other values (126)  626    81.5%

Value  Count  Frequency (%)
0      5      0.7%
44     1      0.1%
56     1      0.1%
57     2      0.3%
61     1      0.1%
62     1      0.1%
65     1      0.1%
67     1      0.1%
68     3      0.4%
71     4      0.5%

Value  Count  Frequency (%)
199    1      0.1%
198    1      0.1%
197    4      0.5%
196    3      0.4%
195    2      0.3%
194    3      0.4%
193    2      0.3%
191    1      0.1%
190    1      0.1%
189    4      0.5%

bp
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 47
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.105469
Minimum: 0
Maximum: 122
Zeros: 35
Zeros (%): 4.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 38.7
Q1: 62
median: 72
Q3: 80
95-th percentile: 90
Maximum: 122
Range: 122
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 19.355807
Coefficient of variation (CV): 0.28009082
Kurtosis: 5.1801566
Mean: 69.105469
Median Absolute Deviation (MAD): 8
Skewness: -1.843608
Sum: 53073
Variance: 374.64727
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=47)

Value              Count  Frequency (%)
70                 57     7.4%
74                 52     6.8%
78                 45     5.9%
68                 45     5.9%
72                 44     5.7%
64                 43     5.6%
80                 40     5.2%
76                 39     5.1%
60                 37     4.8%
0                  35     4.6%
Other values (37)  331    43.1%

Value  Count  Frequency (%)
0      35     4.6%
24     1      0.1%
30     2      0.3%
38     1      0.1%
40     1      0.1%
44     4      0.5%
46     2      0.3%
48     5      0.7%
50     13     1.7%
52     11     1.4%

Value  Count  Frequency (%)
122    1      0.1%
114    1      0.1%
110    3      0.4%
108    2      0.3%
106    3      0.4%
104    2      0.3%
102    1      0.1%
100    3      0.4%
98     3      0.4%
96     4      0.5%

skin_thickness
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 51
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.536458
Minimum: 0
Maximum: 99
Zeros: 227
Zeros (%): 29.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 23
Q3: 32
95-th percentile: 44
Maximum: 99
Range: 99
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 15.952218
Coefficient of variation (CV): 0.77677549
Kurtosis: -0.52007187
Mean: 20.536458
Median Absolute Deviation (MAD): 12
Skewness: 0.1093725
Sum: 15772
Variance: 254.47325
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
0                  227    29.6%
32                 31     4.0%
30                 27     3.5%
27                 23     3.0%
23                 22     2.9%
33                 20     2.6%
28                 20     2.6%
18                 20     2.6%
31                 19     2.5%
19                 18     2.3%
Other values (41)  341    44.4%

Value  Count  Frequency (%)
0      227    29.6%
7      2      0.3%
8      2      0.3%
10     5      0.7%
11     6      0.8%
12     7      0.9%
13     11     1.4%
14     6      0.8%
15     14     1.8%
16     6      0.8%

Value  Count  Frequency (%)
99     1      0.1%
63     1      0.1%
60     1      0.1%
56     1      0.1%
54     2      0.3%
52     2      0.3%
51     1      0.1%
50     3      0.4%
49     3      0.4%
48     4      0.5%

insulin
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 186
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.799479
Minimum: 0
Maximum: 846
Zeros: 374
Zeros (%): 48.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 30.5
Q3: 127.25
95-th percentile: 293
Maximum: 846
Range: 846
Interquartile range (IQR): 127.25

Descriptive statistics

Standard deviation: 115.244
Coefficient of variation (CV): 1.4441699
Kurtosis: 7.2142596
Mean: 79.799479
Median Absolute Deviation (MAD): 30.5
Skewness: 2.2722509
Sum: 61286
Variance: 13281.18
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   374    48.7%
105                 11     1.4%
130                 9      1.2%
140                 9      1.2%
120                 8      1.0%
94                  7      0.9%
180                 7      0.9%
100                 7      0.9%
135                 6      0.8%
115                 6      0.8%
Other values (176)  324    42.2%

Value  Count  Frequency (%)
0      374    48.7%
14     1      0.1%
15     1      0.1%
16     1      0.1%
18     2      0.3%
22     1      0.1%
23     2      0.3%
25     1      0.1%
29     1      0.1%
32     1      0.1%

Value  Count  Frequency (%)
846    1      0.1%
744    1      0.1%
680    1      0.1%
600    1      0.1%
579    1      0.1%
545    1      0.1%
543    1      0.1%
540    1      0.1%
510    1      0.1%
495    2      0.3%

bmi
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 248
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.992578
Minimum: 0
Maximum: 67.1
Zeros: 11
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21.8
Q1: 27.3
median: 32
Q3: 36.6
95-th percentile: 44.395
Maximum: 67.1
Range: 67.1
Interquartile range (IQR): 9.3

Descriptive statistics

Standard deviation: 7.8841603
Coefficient of variation (CV): 0.24643717
Kurtosis: 3.2904429
Mean: 31.992578
Median Absolute Deviation (MAD): 4.6
Skewness: -0.42898159
Sum: 24570.3
Variance: 62.159984
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
0                   11     1.4%
32.4                10     1.3%
33.3                10     1.3%
30.1                9      1.2%
32.8                9      1.2%
32.9                9      1.2%
30.8                9      1.2%
Other values (238)  664    86.5%

Value  Count  Frequency (%)
0      11     1.4%
18.2   3      0.4%
18.4   1      0.1%
19.1   1      0.1%
19.3   1      0.1%
19.4   1      0.1%
19.5   2      0.3%
19.6   3      0.4%
19.9   1      0.1%
20     1      0.1%

Value  Count  Frequency (%)
67.1   1      0.1%
59.4   1      0.1%
57.3   1      0.1%
55     1      0.1%
53.2   1      0.1%
52.9   1      0.1%
52.3   2      0.3%
50     1      0.1%
49.7   1      0.1%
49.6   1      0.1%

diabetes_pedigree_function
Real number (ℝ)

Distinct: 517
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.3313286
Coefficient of variation (CV): 0.70215138
Kurtosis: 5.5949535
Mean: 0.4718763
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.9199111
Sum: 362.401
Variance: 0.10977864
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.268               5      0.7%
0.207               5      0.7%
0.261               5      0.7%
0.259               5      0.7%
0.238               5      0.7%
0.19                4      0.5%
0.263               4      0.5%
0.299               4      0.5%
Other values (507)  719    93.6%

Value  Count  Frequency (%)
0.078  1      0.1%
0.084  1      0.1%
0.085  2      0.3%
0.088  2      0.3%
0.089  1      0.1%
0.092  1      0.1%
0.096  1      0.1%
0.1    1      0.1%
0.101  1      0.1%
0.102  1      0.1%

Value  Count  Frequency (%)
2.42   1      0.1%
2.329  1      0.1%
2.288  1      0.1%
2.137  1      0.1%
1.893  1      0.1%
1.781  1      0.1%
1.731  1      0.1%
1.699  1      0.1%
1.698  1      0.1%
1.6    1      0.1%

age
Real number (ℝ)

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.240885
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.760232
Coefficient of variation (CV): 0.35378816
Kurtosis: 0.64315889
Mean: 33.240885
Median Absolute Deviation (MAD): 7
Skewness: 1.1295967
Sum: 25529
Variance: 138.30305
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
22                 72     9.4%
21                 63     8.2%
25                 48     6.2%
24                 46     6.0%
23                 38     4.9%
28                 35     4.6%
26                 33     4.3%
27                 32     4.2%
29                 29     3.8%
31                 24     3.1%
Other values (42)  348    45.3%

Value  Count  Frequency (%)
21     63     8.2%
22     72     9.4%
23     38     4.9%
24     46     6.0%
25     48     6.2%
26     33     4.3%
27     32     4.2%
28     35     4.6%
29     29     3.8%
30     21     2.7%

Value  Count  Frequency (%)
81     1      0.1%
72     1      0.1%
70     1      0.1%
69     2      0.3%
68     1      0.1%
67     3      0.4%
66     4      0.5%
65     3      0.4%
64     1      0.1%
63     4      0.5%

outcome
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB
0: 500
1: 268

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 768
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring characters

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  768    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  768    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  768    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Interactions

[64 interaction plots (8 × 8 grid over the numeric variables); images not included]

Correlations


Auto

The auto setting picks an interpretable pairwise metric for each pair of columns, according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
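
To make the Numerical-Categorical case concrete, here is a minimal sketch, not taken from the report, that discretizes a numeric column and computes Cramér's V against a categorical one. Two points are assumptions: the sketch uses equal-width bins, while the report does not say how the column is discretized, and it computes the plain (uncorrected) statistic, while the profiling software may apply a bias correction. The function names are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 (equal-width bins, an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# age and outcome from the first ten rows of the Sample section.
age = [50, 31, 32, 21, 33, 30, 26, 29, 53, 54]
outcome = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

v = cramers_v(equal_width_bins(age, n_bins=2), outcome)
print(round(v, 4))
```

Changing n_bins changes the contingency table and therefore the value, which is the granularity effect described above.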

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
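
The calculation just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code the report was generated with: tied values receive the average of their ranks, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))


# A monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
print(rho)
```

Because only the ranks enter the formula, any strictly increasing transformation of x or y leaves ρ unchanged.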

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
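
As an illustration of this formula and of the invariance property, the sketch below computes r on a small made-up data set and again after shifting and rescaling one variable. The numbers and the function name are illustrative only, not taken from the report.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Change the location (+7) and the scale (x3) of y; r stays the same.
r_transformed = pearson_r(x, [3 * v + 7 for v in y])
print(round(r, 6), round(r_transformed, 6))
```

Note that the invariance holds for a positive scale factor; multiplying one variable by a negative number flips the sign of r.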

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
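
The pair counting can be written out directly. The sketch below follows the definition as stated here, which corresponds to the variant usually called τ-a: tied pairs count as neither concordant nor discordant. This is an assumption for illustration; the software behind the report may use a tie-corrected variant. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up illustrative data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(round(tau, 4))
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = 4/6.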

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

     pregnant  plasma  bp  skin_thickness  insulin  bmi   diabetes_pedigree_function  age  outcome
0    6         148     72  35              0        33.6  0.627                       50   1
1    1         85      66  29              0        26.6  0.351                       31   0
2    8         183     64  0               0        23.3  0.672                       32   1
3    1         89      66  23              94       28.1  0.167                       21   0
4    0         137     40  35              168      43.1  2.288                       33   1
5    5         116     74  0               0        25.6  0.201                       30   0
6    3         78      50  32              88       31.0  0.248                       26   1
7    10        115     0   0               0        35.3  0.134                       29   0
8    2         197     70  45              543      30.5  0.158                       53   1
9    8         125     96  0               0        0.0   0.232                       54   1

     pregnant  plasma  bp  skin_thickness  insulin  bmi   diabetes_pedigree_function  age  outcome
758  1         106     76  0               0        37.5  0.197                       26   0
759  6         190     92  0               0        35.5  0.278                       66   1
760  2         88      58  26              16       28.4  0.766                       22   0
761  9         170     74  31              0        44.0  0.403                       43   1
762  9         89      62  0               0        22.5  0.142                       33   0
763  10        101     76  48              180      32.9  0.171                       63   0
764  2         122     70  27              0        36.8  0.340                       27   0
765  5         121     72  23              112      26.2  0.245                       30   0
766  1         126     60  0               0        30.1  0.349                       47   1
767  1         93      70  31              0        30.4  0.315                       23   0